Radiomics preoperative-Fistula Risk Score (RAD-FRS) for pancreatoduodenectomy: development and external validation

Abstract
Background Accurately predicting, before surgery, the risk of clinically relevant postoperative pancreatic fistula after pancreatoduodenectomy may assist surgeons in making more informed treatment decisions and improve patient counselling. The aim was to evaluate the predictive accuracy of a radiomics-based preoperative-Fistula Risk Score (RAD-FRS) for clinically relevant postoperative pancreatic fistula.
Methods Radiomic features were derived from preoperative CT scans of adult patients who underwent pancreatoduodenectomy at a single centre in the Netherlands (Amsterdam, 2013–2018) to develop the radiomics-based preoperative-Fistula Risk Score. Extracted radiomic features were analysed with four machine learning classifiers. The model was externally validated in a single centre in Italy (Verona, 2020–2021). The radiomics-based preoperative-Fistula Risk Score was compared with the Fistula Risk Score and the updated alternative Fistula Risk Score.
Results Overall, 359 patients underwent a pancreatoduodenectomy, of whom 89 (25 per cent) developed a clinically relevant postoperative pancreatic fistula. The radiomics-based preoperative-Fistula Risk Score model was developed using CT scans of 118 patients, with three radiomic features included in the random forest model, and externally validated in 57 patients. The model performed well with an area under the curve of 0.90 (95 per cent c.i. 0.71 to 0.99) and 0.81 (95 per cent c.i. 0.69 to 0.92) in the Amsterdam test set and Verona data set respectively. The radiomics-based preoperative-Fistula Risk Score performed similarly to the Fistula Risk Score (area under the curve 0.79) and updated alternative Fistula Risk Score (area under the curve 0.79).
Conclusion The radiomics-based preoperative-Fistula Risk Score, which uses only preoperative CT features, is a new and promising radiomics-based score that has the potential to be integrated with hospital CT report systems and improve patient counselling before surgery. The model with underlying code is readily available via www.pancreascalculator.com and www.github.com/PHAIR-Consortium/POPF-predictor.


Introduction
Clinically relevant postoperative pancreatic fistula (CR-POPF) is a feared complication following pancreatoduodenectomy that negatively impacts short- and long-term outcomes 1,2. Accurate risk stratification of CR-POPF in the preoperative setting can assist in determining the best surgical approach for high-risk or frail patients. In the perioperative interval, patients at high risk of CR-POPF may be candidates for prophylactic treatment such as somatostatin analogues 3. In patients with cystic lesions of the pancreatic head, accurate risk stratification of CR-POPF can help decide whether to proceed with a pancreatoduodenectomy or consider alternative approaches 4. A recent study demonstrated improved postoperative outcomes for high-risk patients who underwent a total pancreatectomy compared with those who underwent a pancreatoduodenectomy 5.
Previous research has introduced several CR-POPF prediction models for pancreatoduodenectomy, including the Fistula Risk Score (FRS) and the updated alternative Fistula Risk Score (ua-FRS) 6,7. These risk models are commonly used but have limitations, including their reliance on subjective intraoperative

Patients
For model design, adult patients who underwent pancreatoduodenectomy at one of the two locations of the Amsterdam UMC (Vrije Universiteit Medical Center) were included from the Dutch Pancreatic Cancer Audit (January 2013–December 2018). Patients were excluded if the CT scan was of poor quality or had a slice thickness >3.0 mm. A poor quality CT scan was defined as a scan degraded by artefacts (for example, respiratory motion artefacts). For external validation of the model, adult patients who underwent pancreatoduodenectomy at the Verona University Hospital were included (January 2020–January 2021). The same eligibility criteria were applied. These data sets will be referred to as the Amsterdam and Verona data sets respectively.

Data acquisition and outcome
The following patient demographics and tumour characteristics were retrospectively obtained from electronic health records: age (years), sex, BMI (kg/m²), ASA classification, pre-existing diabetes, application of neoadjuvant therapy, surgical approach (that is, open, laparoscopic or robotic surgery), pancreatic duct size, pancreatic texture, intraoperative blood loss, length of surgical procedure and pathology. The pancreatic duct size was measured intraoperatively with a ruler. The pancreatic texture was assessed subjectively by the surgeon through intraoperative palpation; in minimally invasive surgery, the texture of the resected pancreatoduodenectomy specimen was assessed instead. Contrast-enhanced CT scans were obtained from the picture archiving and communication systems. The primary outcome was CR-POPF, defined as grade B or C according to the 2016 International Study Group on Pancreatic Surgery (ISGPS) definition 15.

Data sets
Two full data sets (the Amsterdam and Verona data sets) were used. The Amsterdam data set was further divided into three subsets as follows. First, the Amsterdam data set was divided into two sets: the Amsterdam development set, comprising 90 per cent of the Amsterdam data set, and the Amsterdam test set, comprising the remaining 10 per cent. Within the Amsterdam development set, Amsterdam training and validation sets were then created using five-fold cross-validation. The models were trained on the Amsterdam development set and evaluated on both the Amsterdam test set and the Verona data set. The data splits of the Amsterdam and Verona data sets are visualized in Fig. 1.
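As an illustration only, the partition described above could be sketched as follows. This is a minimal sketch and not the authors' code: the unstratified shuffle, the seed and the helper names `split_development_test` and `five_fold` are assumptions, and the cohort size of 118 is the number of included Amsterdam patients reported in the Results.

```python
import random

def split_development_test(n_patients, test_fraction=0.10, seed=0):
    """Shuffle patient indices and hold out a fraction as the test set."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # assumption: plain, unstratified shuffle
    n_test = round(n_patients * test_fraction)
    return indices[n_test:], indices[:n_test]

def five_fold(development, k=5):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [development[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

development, test = split_development_test(118)
folds = list(five_fold(development))
print(len(development), len(test))                    # 106 12
print([len(validation) for _, validation in folds])   # [22, 21, 21, 21, 21]
```

Under these assumptions, a 10 per cent hold-out of 118 patients gives a test set of 12, which matches the size of the Amsterdam test set reported in the Results. Repeating the split with different seeds would be one way to obtain repeated development-test splits.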

Image segmentation and radiomic analysis
We constructed a radiomics workflow consisting of the segmentation of the volume of interest (VOI) and radiomic feature extraction, feature selection and model construction on the Amsterdam development set, and performance evaluation on the Amsterdam test set and Verona data set.
A PhD candidate specializing in the field of radiology and surgery (E.I.) and a resident in radiology (I.V.) manually pre-segmented the VOI in either the (late) arterial or portal venous phase using 3D Slicer version 4.11.20210226 (www.slicer.org) 16. Subsequently, an experienced abdominal radiologist (Y.N. or F.S. in the Amsterdam data set and R.R. in the Verona data set), who was blinded to the patient's outcome, finalized the segmentation. The VOI represented the pancreatic remnant, segmented distal/lateral (that is, towards the tail) from the midline of the superior mesenteric vein, considering that all patients underwent a pancreatoduodenectomy. The segmentation encompassed both pancreatic tissue and the pancreatic duct. The surrounding vessels (that is, splenic artery, splenic vein, superior mesenteric vein and portal vein) were segmented to prevent their inclusion in the pancreatic segmentation mask but were not part of the final VOI (Fig. 2). If both the arterial and portal venous phases were available, the pancreas was segmented in the (late) arterial phase. If the arterial phase was unavailable, the pancreas was segmented in the portal venous phase.
Radiomic features were extracted from each VOI of patients in the Amsterdam data set using PyRadiomics version 3.0 (http://github.com/radiomics/pyradiomics) 17. Subsequently, 25 Amsterdam development-test splits were created for final model selection. For each split, 90 per cent of the Amsterdam data set was allocated to training and 10 per cent to testing. The features of each Amsterdam development and test set were normalized independently using the MinMax scaler. Two feature reduction methods were applied serially to the Amsterdam development sets to select the most predictive features for each split: removal of features with a variance of less than 0.001, followed by removal of features with an importance near zero using least absolute shrinkage and selection operator (LASSO) feature selection.
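The normalization and two-step feature reduction described above might look roughly like the sketch below. It assumes scikit-learn and uses a synthetic feature matrix rather than real radiomic features; the LASSO penalty `alpha=0.02` is an illustrative assumption, since the text does not report the value used, and using `Lasso` with `SelectFromModel` is one possible reading of "LASSO feature selection", not necessarily the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a radiomic feature matrix (rows: patients, columns: features).
rng = np.random.default_rng(0)
X = rng.random((100, 6))
X[:, 5] = 3.0                      # a constant, uninformative feature
y = (X[:, 0] > 0.5).astype(int)    # outcome driven by feature 0 only

# Step 1: MinMax scaling of every feature to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X)

# Step 2: drop features whose variance is below 0.001 (threshold from the text).
variance_filter = VarianceThreshold(threshold=0.001).fit(X_scaled)
kept_after_variance = np.flatnonzero(variance_filter.get_support())

# Step 3: LASSO; features whose coefficient shrinks to (near) zero are dropped.
# alpha=0.02 is an assumption for illustration, not a value from the paper.
lasso_filter = SelectFromModel(Lasso(alpha=0.02)).fit(
    X_scaled[:, kept_after_variance], y
)
selected = kept_after_variance[lasso_filter.get_support()]
print("selected feature indices:", selected.tolist())
```

In this toy example the constant feature is removed by the variance filter and the outcome-driving feature survives the LASSO step; how many of the pure-noise features also survive depends on the chosen penalty.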
Four machine-learning (ML) classifiers were fitted to the Amsterdam development sets: support vector machine, logistic regression, k-nearest neighbour and random forest. Before training, the minority classes in the Amsterdam development sets were oversampled with a sampling strategy of 0.9 and, subsequently, the majority classes were undersampled. To create more equally sized groups based on the presence or absence of CR-POPF, the data were manually undersampled. Finally, random Gaussian noise with a mean of zero and a variance of 0.2 was added to the Amsterdam development sets. A five-fold cross-validation grid search optimizing the area under the curve-receiver operating characteristic (AUROC) was used to find the hyperparameters and fit all four ML classifiers on the Amsterdam development sets. All four ML classifiers were optimized, fitted and evaluated on all 25 Amsterdam development-test splits. The best-performing ML classifier with regard to AUROC on the Amsterdam test set was validated on the Verona data set. The performance of this model on both the Amsterdam test set and Verona data set was evaluated using the AUROC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and calibration plots.
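A rough sketch of the training procedure described above is given here, assuming scikit-learn and synthetic data. The helper `oversample_minority`, the hyperparameter grids and the data are hypothetical; the sketch interprets the sampling strategy of 0.9 as a target minority-to-majority ratio, omits the undersampling step, and is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for one development set: 106 patients, 3 features.
X = rng.random((106, 3))
y = (X[:, 0] + 0.3 * rng.standard_normal(106) > 0.6).astype(int)

def oversample_minority(X, y, ratio=0.9):
    """Duplicate random minority rows until minority/majority >= ratio."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = max(0, int(np.ceil(ratio * counts.max())) - int(counts.min()))
    extra = rng.choice(np.flatnonzero(y == minority), size=n_extra, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

X_os, y_os = oversample_minority(X, y)

# Gaussian noise with mean 0 and variance 0.2, i.e. standard deviation sqrt(0.2).
X_aug = X_os + rng.normal(0.0, np.sqrt(0.2), size=X_os.shape)

# The grids below are illustrative assumptions; the text does not list them.
search_space = {
    "svm": (SVC(), {"C": [0.1, 1, 10]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "random_forest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
}

# Five-fold cross-validated grid search, scored by AUROC, for each classifier.
auroc = {}
for name, (estimator, grid) in search_space.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=5).fit(X_aug, y_os)
    auroc[name] = search.best_score_
print(auroc)
```

The cross-validated AUROC values printed here describe only the synthetic data and say nothing about the performance of the RAD-FRS.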

Statistical analysis
Statistical analysis was conducted using Python version 3.7 (Python Software Foundation, Wilmington, DE, USA) and RStudio version 2022.07.1 (R Studio Team (2018), RStudio: Integrated Development for R. RStudio Inc., Vienna, Austria). The AUROC of the RAD-FRS, FRS and ua-FRS was reported with a 95 per cent confidence interval (95 per cent c.i.). The maximum Youden's J value on the receiver operating characteristic curve was used to identify the cut-off probability at which discrimination was maximal according to sensitivity and specificity. The AUROC of the RAD-FRS was compared with the AUROC of the FRS and ua-FRS in the Verona data set using DeLong's test. Calibration curves were used to compare the observed and estimated probabilities of the models. A P value <0.050 was considered statistically significant. Continuous variables were reported as mean with standard deviation (s.d.), or as median with interquartile range (i.q.r.) if the distribution was skewed. Dichotomous, ordinal and nominal variables were presented as numbers and percentages.
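For readers unfamiliar with Youden's J, the statistic is J = sensitivity + specificity − 1, and the cut-off is the predicted probability at which J is largest. The sketch below illustrates this on a small made-up example; the function name `youden_cutoff` and the data are hypothetical and are not taken from the study.

```python
import numpy as np

def youden_cutoff(y_true, y_prob):
    """Return (cut-off, J, sensitivity, specificity) maximising Youden's J."""
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    best = None
    for threshold in np.unique(y_prob):  # candidate cut-offs in ascending order
        predicted = y_prob >= threshold
        sensitivity = np.sum(predicted & y_true) / np.sum(y_true)
        specificity = np.sum(~predicted & ~y_true) / np.sum(~y_true)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:  # ties keep the lowest cut-off
            best = (float(threshold), float(j), float(sensitivity), float(specificity))
    return best

# Hypothetical outcomes (1 = CR-POPF) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.9]

cutoff, j, sensitivity, specificity = youden_cutoff(y_true, y_prob)
print(cutoff, round(j, 3), sensitivity, round(specificity, 3))  # 0.45 0.833 1.0 0.833
```

In this toy example a cut-off of 0.45 classifies all four events correctly and five of the six non-events, giving J = 1 + 5/6 − 1 ≈ 0.83.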

Results
Overall, 224 patients underwent a pancreatoduodenectomy at Amsterdam UMC in the Netherlands and 135 at the Verona University Hospital in Italy. A total of 89 (25 per cent) developed CR-POPF. In the Amsterdam data set, a total of 106 patients were excluded: 100 patients without CR-POPF were randomly excluded from the analysis as a result of undersampling, three patients were excluded because of a poor quality CT scan, and three did not have slices ≤3.0 mm, resulting in 118 patients that were included in the Amsterdam data set. Of those, 50 developed CR-POPF (Fig. 3).
In the Verona data set, 135 patients underwent a pancreatoduodenectomy between January 2020 and January 2021, with CR-POPF occurring in 22 patients (Fig. 4). As a result of undersampling, 60 patients without CR-POPF were randomly excluded from the analysis. Another 18 patients were excluded: 6 because of poor CT scan quality and 12 because they did not have slices ≤3.0 mm. In total, 57 patients were included in the Verona data set, of whom 22 developed CR-POPF.

Characteristics of the Amsterdam and Verona data sets
The clinical characteristics of both the Amsterdam and Verona data sets are shown in Table 1. The data sets differed in sex (36 per cent female in the Amsterdam data set versus 45 per cent in the Verona data set), the application of neoadjuvant therapy (14 per cent in the Amsterdam data set versus 58 per cent in the Verona data set), surgical approach (15 per cent of patients in the Amsterdam data set underwent a laparoscopic pancreatoduodenectomy versus 0 per cent in the Verona data set), and application of a prophylactic somatostatin analogue (15.3 per cent in the Amsterdam data set versus 31.2 per cent in the Verona data set). The Amsterdam data set comprised 118 patients with a median age of 68 years (i.q.r. 59–79 years) and a mean BMI of 25.7 kg/m² (s.d. 3.9 kg/m²). The Verona data set included 57 patients with a median age of 68 years (i.q.r. 61–75 years) and a mean BMI of 25 kg/m² (s.d. 4.1 kg/m²). Reconstruction and acquisition parameters used to obtain the CT scans of the Amsterdam and Verona data sets are listed in Table S1.

Amsterdam data set
In the Amsterdam data set, 103 (87 per cent) CT scans were segmented in the (late) arterial phase and 15 (13 per cent) in the portal venous phase. A total of 120 radiomic features were extracted. These features are listed in Table S2. The voxel volume feature corresponds to the volume of the VOI and thus to the volume of the pancreatic remnant. The minor axis length yields the second-largest axis length of the VOI and could correlate with the pancreatic thickness. The random forest model performed best, with an area under the curve (AUC) of 0.90 (95 per cent c.i. 0.71 to 0.99) in the Amsterdam test set (n = 12). This model had a sensitivity of 1.00, specificity of 0.67, PPV of 0.71 and NPV of 1.00 in the Amsterdam test set. For the calibration curve of the Amsterdam test set, see Fig. 5. The results of the other three machine learning models are listed in Table 2.

Verona data set
In the Verona data set, 52 (91 per cent) CT scans were segmented in the (late) arterial phase and 5 (9 per cent) in the portal venous phase. The AUC of the random forest model on the Verona data set (n = 57) was 0.81 (95 per cent c.i. 0.69 to 0.92), with a sensitivity of 0.96, specificity of 0.66, PPV of 0.64 and NPV of 0.96. The calibration curve indicated that the model's performance was consistent with the observed data in the Verona data set and demonstrated an improvement in the consistency between estimated and observed probabilities compared with the Amsterdam test set (Fig. 5).

Fig. 2 An example of a contrast-enhanced CT scan in the early arterial phase of the abdomen, showing the contoured pancreatic tissue (pink), pancreatic duct (light blue), splenic artery (red) and the superior mesenteric vein (dark blue) in an axial image
The vertical yellow line indicates the midline of the superior mesenteric vein, with the pancreas annotated on the left side. The volume of interest consisted of the pancreatic tissue and pancreatic duct. CT, computed tomography.

Fig. 3 Flow chart of the Amsterdam data set
Flow chart showing the selection process of the Amsterdam data set. A total of 100 patients were excluded in the group without CR-POPF to create more equally sized groups based on the presence or absence of CR-POPF. Further exclusion criteria were: poor image quality (n = 3) and slice thickness above 3 mm (n = 3). A total of 50 patients with CR-POPF and 68 without CR-POPF were found eligible for analysis. PD, pancreatoduodenectomy; CR-POPF, clinically relevant postoperative pancreatic fistula; CT, computed tomography.

Fig. 4 Flow chart of the Verona data set
Flow chart showing the selection process of the Verona data set. A total of 60 patients were excluded in the group without CR-POPF to create more equally sized groups based on the presence or absence of CR-POPF. Further exclusion criteria were: poor image quality (n = 6) and slice thickness above 3 mm (n = 12). A total of 22 patients with CR-POPF and 35 without CR-POPF were found eligible for analysis. PD, pancreatoduodenectomy; CR-POPF, clinically relevant postoperative pancreatic fistula; CT, computed tomography.

Table 2 Predictive performances of the four machine learning models on the Amsterdam test set

Fig. 5 A calibration plot of the RAD-FRS in the Amsterdam test set and Verona data set
The black dots represent the quintiles of the observed probabilities by quintiles of the predicted probabilities of the Amsterdam test set, while the white triangles represent the same for the Verona data set.The dashed line represents the ideal performance of the score.RAD-FRS, radiomics preoperative-Fistula Risk Score.

Table 3 Comparison of the performance of the RAD-FRS model with other published studies that have used radiomic features derived from preoperative CT scans to predict the occurrence of CR-POPF
CR-POPF, clinically relevant postoperative pancreatic fistula; CT, computed tomography; RAD-FRS, radiomics preoperative-Fistula Risk Score; AUROC, area under the curve-receiver operating characteristics.